A Review of Methods for the Analysis of the Expected Value of Information
Over recent years, Value of Information analysis has become more widespread in
health-economic evaluations, specifically as a tool to perform Probabilistic
Sensitivity Analysis. This is largely due to methodological advances
allowing for the fast computation of a typical summary known as the Expected
Value of Partial Perfect Information (EVPPI). A recent review discussed some
estimation methods for calculating the EVPPI, but as research has remained
active in the intervening years, that review does not discuss some key
estimation methods. Therefore, this paper presents a comprehensive review of
these new methods. We begin by providing the technical details of these
computation methods. We then present a case study in order to compare the
estimation performance of these new methods. We conclude that the most recent
development, based on non-parametric regression, offers the best method for
calculating the EVPPI efficiently. This means that the EVPPI can now be used
practically in health economic evaluations, especially as all the methods are
developed in parallel with
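A minimal sketch of the regression-based EVPPI estimator favoured above, on a toy two-treatment decision model. The model, the parameter `theta`, and the polynomial fit (standing in for the flexible non-parametric smoother, e.g. a GAM or GP, used in practice) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000

# Toy probabilistic sensitivity analysis: two treatments whose net benefit
# depends on an uncertain parameter theta plus residual uncertainty.
theta = rng.normal(0.0, 1.0, N)
nb0 = np.zeros(N)                             # baseline net benefit
nb1 = 0.5 * theta + rng.normal(0.0, 1.0, N)   # comparator net benefit

# Regression-based EVPPI: regress each simulated net benefit on theta and
# use the fitted values as estimates of E[NB | theta].
def fitted(nb, theta, degree=3):
    coefs = np.polyfit(theta, nb, degree)
    return np.polyval(coefs, theta)

g = np.column_stack([fitted(nb0, theta), fitted(nb1, theta)])

# EVPPI = E[max_d E[NB_d | theta]] - max_d E[NB_d]
evppi = np.mean(g.max(axis=1)) - g.mean(axis=0).max()
print(f"EVPPI estimate: {evppi:.3f}")
```

In this conjugate toy the true value is E[max(0, 0.5 theta)] with theta ~ N(0, 1), roughly 0.2, so the regression estimate can be checked directly.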
A Bayesian partial identification approach to inferring the prevalence of accounting misconduct
This paper describes the use of flexible Bayesian regression models for
estimating a partially identified probability function. Our approach permits
efficient sensitivity analysis concerning the impact of priors on the
posterior of the partially identified component of the regression model. The
new methodology is illustrated on an important problem where only partially
observed data are available: inferring the prevalence of accounting misconduct
among publicly traded U.S. businesses.
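To illustrate the general idea of partial identification (a toy sketch, not the paper's model): suppose misconduct is only observed when it is detected, so prevalence is identified only up to an unknown detection probability, and the prior placed on that probability drives the posterior. All quantities below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 30 detected cases among 1000 firms. Observed detection rate
# equals prevalence * p_detect, so prevalence = rate / p_detect is only
# partially identified while p_detect is unknown.
n, detected = 1000, 30

# Posterior for the point-identified quantity (the detection rate),
# under a Beta(1, 1) prior.
rate_draws = rng.beta(1 + detected, 1 + n - detected, 10000)

# Prior over the unidentified detection probability: sensitivity analysis
# amounts to varying this prior (here Uniform(0.2, 0.8) as an assumption).
p_detect = rng.uniform(0.2, 0.8, 10000)
prevalence = np.clip(rate_draws / p_detect, 0.0, 1.0)

lo, hi = np.percentile(prevalence, [2.5, 97.5])
print(f"95% credible interval for prevalence: [{lo:.3f}, {hi:.3f}]")
```

Re-running the last block with a different prior on `p_detect` shows how strongly the partially identified component depends on that choice, which is the kind of sensitivity analysis the paper formalises.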
Counterfactual Learning with Multioutput Deep Kernels
In this paper, we address the challenge of performing counterfactual
inference with observational data via Bayesian nonparametric regression
adjustment, with a focus on high-dimensional settings featuring multiple
actions and multiple correlated outcomes. We present a general class of
counterfactual multi-task deep kernel models that estimate causal effects and
learn policies proficiently, thanks to their sample-efficiency gains, while
scaling well to high dimensions. In the first part of the work, we rely on
Structural Causal Models (SCMs) to formally introduce the setup and the problem
of identifying counterfactual quantities under observed confounding. We then
discuss the benefits of tackling causal effect estimation via
stacked coregionalized Gaussian Processes and Deep Kernels. Finally, we
demonstrate the use of the proposed methods on simulated experiments that span
individual causal effect estimation, off-policy evaluation and optimization.
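The regression-adjustment idea can be sketched with single-output GPs fitted per action, a deliberately simplified stand-in for the coregionalized multi-task deep kernels of the paper. The simulated data, kernel choices, and `scikit-learn` models below are illustrative assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
n = 300

# Simulated observational data: confounder x drives both action choice and
# outcome; the true individual treatment effect is tau(x) = 1 + x.
x = rng.uniform(-1, 1, (n, 1))
a = (rng.uniform(size=n) < 0.3 + 0.4 * (x[:, 0] > 0)).astype(int)
y = np.sin(3 * x[:, 0]) + a * (1 + x[:, 0]) + rng.normal(0, 0.1, n)

# Regression adjustment: one GP per action (simplified stand-in for the
# coregionalized multi-task model), predicting both potential outcomes for
# every unit and taking the difference as the estimated causal effect.
kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=0.01)
models = {}
for action in (0, 1):
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(x[a == action], y[a == action])
    models[action] = gp

cate = models[1].predict(x) - models[0].predict(x)
print(f"estimated ATE: {cate.mean():.3f}  (true ATE ~ 1.0)")
```

A coregionalized multi-output model would additionally share information across the two action-specific surfaces, which is where the sample-efficiency gains claimed in the abstract come from.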
Estimating the Expected Value of Sample Information across Different Sample Sizes using Moment Matching and Non-Linear Regression
Background: The Expected Value of Sample Information (EVSI) determines the
economic value of any future study with a specific design aimed at reducing
uncertainty in a health economic model. This has potential as a tool for trial
design; the cost and value of different designs could be compared to find the
trial with the greatest net benefit. However, despite recent developments, EVSI
analysis can be slow, especially when optimising over a large number of
different designs. Methods: This paper develops a method to reduce the
computation time required to calculate the EVSI across different sample sizes.
Our method extends the moment matching approach to EVSI estimation to optimise
over different sample sizes for the underlying trial with a similar
computational cost to a single EVSI estimate. This extension calculates
posterior variances across the alternative sample sizes and then uses Bayesian
non-linear regression to calculate the EVSI. Results: A health economic model
developed to assess the cost-effectiveness of interventions for chronic pain
demonstrates that this EVSI calculation method is fast and accurate for
realistic models. This example also highlights how different trial designs can
be compared using the EVSI. Conclusion: The proposed estimation method is fast
and accurate when calculating the EVSI across different sample sizes. This will
allow researchers to realise the potential of using the EVSI to determine an
economically optimal trial design for reducing uncertainty in health economic
models. Limitations: Our method relies on some additional simulation, which can
be expensive in models with very large computational cost.
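The pipeline of estimating posterior variances at a few pilot sample sizes, fitting a non-linear regression across sample sizes, and then moment matching can be sketched on a conjugate toy model. Here the posterior variances are available in closed form (in a real health economic model they would come from nested simulation), and all numerical values are invented for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

# Toy conjugate model: incremental net benefit INB ~ N(mu0, tau2) a priori;
# a trial of size n measures INB with sampling variance sigma2 / n.
mu0, tau2, sigma2 = 0.2, 1.0, 25.0

def post_var(n):
    # Posterior variance of INB after a size-n trial (conjugate update).
    return 1.0 / (1.0 / tau2 + n / sigma2)

# Step 1: posterior variances at a few pilot sample sizes.
pilot_n = np.array([10.0, 25.0, 50.0, 100.0])
pilot_v = post_var(pilot_n)

# Step 2: non-linear regression of posterior variance on n, using the
# parametric shape v(n) = tau2 * n0 / (n + n0) implied by conjugacy.
def v_model(n, n0):
    return tau2 * n0 / (n + n0)

(n0_hat,), _ = curve_fit(v_model, pilot_n, pilot_v, p0=[10.0])

# Step 3: moment matching -- rescale prior draws of INB so their variance
# equals the preposterior variance tau2 - v(n), then read off EVSI(n).
theta = rng.normal(mu0, np.sqrt(tau2), 100_000)

def evsi(n):
    shrink = np.sqrt((tau2 - v_model(n, n0_hat)) / tau2)
    post_mean_draws = mu0 + shrink * (theta - mu0)
    return np.mean(np.maximum(post_mean_draws, 0.0)) - max(mu0, 0.0)

for n in (20, 200):
    print(f"EVSI(n={n}) = {evsi(n):.4f}")
```

Once `n0_hat` is fitted, `evsi(n)` is essentially free to evaluate at any sample size, which is what makes comparing many trial designs cheap.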
Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability
Understanding the shopping motivations behind market baskets has high
commercial value in the grocery retail industry. Analyzing shopping
transactions demands techniques that can cope with the volume and
dimensionality of grocery transactional data while keeping interpretable
outcomes. Latent Dirichlet Allocation (LDA) provides a suitable framework to
process grocery transactions and to discover a broad representation of
customers' shopping motivations. However, summarizing the posterior
distribution of an LDA model is challenging, and individual LDA draws may lack
coherence and fail to capture topic uncertainty. Moreover, the evaluation of
LDA models is dominated by model-fit measures which may not adequately capture
the qualitative aspects such as interpretability and stability of topics.
In this paper, we introduce a clustering methodology that post-processes
posterior LDA draws to summarise the entire posterior distribution and identify
semantic modes represented as recurrent topics. Our approach is an alternative
to standard label-switching techniques and provides a single posterior summary
set of topics, as well as associated measures of uncertainty. Furthermore, we
establish a more holistic definition for model evaluation, which assesses topic
models based not only on their likelihood but also on their coherence,
distinctiveness and stability. By means of a survey, we set thresholds for the
interpretation of topic coherence and topic similarity in the domain of grocery
retail data. We demonstrate that the selection of recurrent topics through our
clustering methodology not only improves model likelihood but also enhances
the qualitative aspects of LDA such as interpretability and stability. We
illustrate our methods on an example from a large UK supermarket chain.
Comment: 20 pages, 9 figures
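The post-processing step, clustering topics pooled across posterior draws and keeping clusters that recur in every draw, can be sketched on synthetic topic-word distributions. The vocabulary size, number of draws, and the cosine-distance threshold are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)

# Toy stand-in for LDA posterior draws: 3 posterior samples x 4 topics, each
# topic a distribution over a 20-word vocabulary. Two "true" topics recur
# (with small perturbations) in every sample; the rest are noise topics.
V = 20
base = np.full((2, V), 0.1 / V)
base[0, 0] += 0.9        # recurrent topic A, dominated by word 0
base[1, 1] += 0.9        # recurrent topic B, dominated by word 1

draws = []
for s in range(3):
    recurrent = np.clip(base + rng.normal(0, 0.002, base.shape), 1e-8, None)
    noise = rng.dirichlet(np.full(V, 0.1), 2)
    topics = np.vstack([recurrent, noise])
    draws.append(topics / topics.sum(axis=1, keepdims=True))
all_topics = np.vstack(draws)            # (3 samples * 4 topics, V)

# Post-process: hierarchical clustering under cosine distance; clusters whose
# members span every posterior sample are reported as recurrent topics.
Z = linkage(all_topics, method="average", metric="cosine")
labels = fcluster(Z, t=0.2, criterion="distance")

sample_of = np.repeat(np.arange(3), 4)
recurrent_clusters = [
    c for c in np.unique(labels)
    if len(np.unique(sample_of[labels == c])) == 3
]
print(f"recurrent topics found: {len(recurrent_clusters)}")
```

Averaging the topic-word vectors within each recurrent cluster then yields the single posterior summary set of topics, with the within-cluster spread serving as a measure of uncertainty.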
Interpretable Deep Causal Learning for Moderation Effects
In this extended abstract, we address the problem of interpretability and targeted regularization in causal machine learning models. In particular, we focus on estimating individual causal/treatment effects under observed confounders, which can be controlled for and which moderate the effect of the treatment on the outcome of interest. Black-box ML models adapted to the causal setting generally perform well in this task, but they lack interpretable output identifying the main drivers of treatment heterogeneity and their functional relationship. We propose a novel deep counterfactual learning architecture for estimating individual treatment effects that can simultaneously: i) convey targeted regularization on, and quantify uncertainty around, the quantity of interest (i.e., the Conditional Average Treatment Effect); ii) disentangle baseline prognostic and moderating effects of the covariates and output interpretable score functions describing their relationship with the outcome. Finally, we demonstrate the use of the method via a simple simulated experiment.
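The disentanglement of prognostic and moderating effects can be sketched with a linear surrogate for the deep architecture: fit a joint model with separate prognostic and moderation blocks, so the moderation coefficients directly describe treatment-effect heterogeneity. The data-generating process and basis expansion are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000

# Simulated data following an additive causal decomposition:
#   y = g(x) (prognostic) + t * tau(x) (moderated treatment effect)
x = rng.uniform(-1, 1, (n, 2))
t = rng.binomial(1, 0.5, n)
tau_true = 1.0 + 0.8 * x[:, 0]                  # x0 moderates the effect
y = x[:, 1] ** 2 + t * tau_true + rng.normal(0, 0.1, n)

# Interpretable surrogate: a joint linear model with separate prognostic
# and moderation blocks over the same basis expansion.
phi = np.column_stack([np.ones(n), x, x ** 2])  # [1, x0, x1, x0^2, x1^2]
X = np.hstack([phi, t[:, None] * phi])          # [prognostic | moderation]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

k = phi.shape[1]
moderation = beta[k:]                           # coefficients of t * phi(x)
cate = phi @ moderation                         # estimated tau(x) per unit
print(f"mean CATE: {cate.mean():.3f}  moderation weight on x0: {moderation[1]:.3f}")
```

Because the moderation block is linear in the basis, its coefficients play the role of the interpretable score functions in the paper: here the fit should assign weight near 0.8 to `x0` and near zero to the non-moderating terms.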
Mixture polarization in inter-rater agreement analysis: a Bayesian nonparametric index
In several observational contexts where different raters evaluate a set of
items, it is common to assume that all raters draw their scores from the same
underlying distribution. However, many scientific works have evidenced the
relevance of individual variability in different types of rating tasks. To
address this issue, the intra-class correlation coefficient (ICC) has been used
as a measure of variability among raters within the Hierarchical Linear Models
approach. A common distributional assumption in this setting is to specify
hierarchical effects as independent and identically distributed from a normal
with the mean parameter fixed to zero and unknown variance. The present work
aims to overcome this strong assumption in the inter-rater agreement estimation
by placing a Dirichlet Process Mixture over the hierarchical effects' prior
distribution. A new nonparametric index is proposed to quantify rater
polarization in the presence of group heterogeneity. The model is applied to a
set of simulated experiments and real-world data. Possible future directions
are discussed.
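The prior at the heart of the proposal, a Dirichlet Process Mixture over rater effects in place of a single N(0, sigma^2), can be simulated via truncated stick-breaking. The polarization summary at the end is an illustrative placeholder, not the paper's index:

```python
import numpy as np

rng = np.random.default_rng(6)

# Truncated stick-breaking draw from a Dirichlet Process Mixture prior over
# rater effects, replacing the usual i.i.d. N(0, sigma^2) assumption.
def dpm_rater_effects(n_raters, alpha=1.0, trunc=20):
    v = rng.beta(1, alpha, trunc)                      # stick-breaking fractions
    w = v * np.concatenate([[1.0], np.cumprod(1 - v)[:-1]])
    atoms = rng.normal(0.0, 1.0, trunc)                # cluster locations
    z = rng.choice(trunc, size=n_raters, p=w / w.sum())
    return atoms[z] + rng.normal(0, 0.1, n_raters), z

effects, z = dpm_rater_effects(200)

# Illustrative polarization summary (not the paper's index): how much of the
# rater population concentrates in the two largest clusters.
counts = np.bincount(z, minlength=20)
top2 = np.argsort(counts)[-2:]
share = counts[top2].sum() / 200
print(f"share of raters in two largest clusters: {share:.2f}")
```

Under the DPM prior the rater effects fall into a data-driven number of groups, which is what allows a polarization index to detect, say, two opposing camps of raters rather than a single homogeneous population.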
Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility
Understanding the shopping motivations behind market baskets has significant commercial value for the grocery retail industry. The analysis of shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while delivering interpretable outcomes. Latent Dirichlet allocation (LDA) allows the processing of grocery transactions and the discovery of customer behaviours. Interpretations of topic models typically exploit individual samples, overlooking the uncertainty of single topics. Moreover, training LDA multiple times shows topics with large uncertainty; that is, topics (dis)appear in some but not all posterior samples, concurring with various authors in the field. In response, we introduce a clustering methodology that post-processes posterior LDA draws to summarise topic distributions represented as recurrent topics. Our approach identifies clusters of topics that belong to different samples and provides associated measures of uncertainty for each group. Our proposed methodology allows the identification of an unconstrained number of customer behaviours presented as recurrent topics. We also establish a more holistic framework for model evaluation, which assesses topic models based not only on their predictive likelihood but also on quality aspects such as coherence and distinctiveness of single topics and credibility of a set of topics. Using the outcomes of a tailored survey, we set thresholds that aid in interpreting quality aspects in grocery retail data. We demonstrate that selecting recurrent topics not only improves predictive likelihood but also enhances interpretability and credibility. We illustrate our methods with an example from a large British supermarket chain.
Regional Topics in British Grocery Retail Transactions
Understanding the customer behaviours behind transactional data has high commercial value in the grocery retail industry. Customers generate millions of transactions every day, choosing and buying products to satisfy specific shopping needs. Product availability may vary geographically due to local demand and local supply, thus driving the importance of analysing transactions within their corresponding store and regional context. Topic models provide a powerful tool in the analysis of transactional data, identifying topics that display frequently-bought-together products and summarising transactions as mixtures of topics. We use the Segmented Topic Model (STM) to capture customer behaviours that are nested within stores. STM not only provides topics and transaction summaries but also topical summaries at the store level that can be used to identify regional topics. We summarise the posterior distribution of STM by post-processing multiple posterior samples and selecting semantic modes represented as recurrent topics. We use linear Gaussian process regression to model topic prevalence across the British territory while accounting for spatial autocorrelation. We implement our methods on a dataset of transactional data from a major UK grocery retailer and demonstrate that shopping behaviours may vary regionally and that nearby stores tend to exhibit similar regional demand.